
We are going to use a workflow with the following goals:
  1. Repeatability. We need to be sure that if we (or someone else) re-run the analyses, we will get the same results.
  2. Ease. We want to minimize the overhead of re-running or revising our analyses. If the data changes, it should be a simple matter of re-running our scripts.
  3. Transparency & organization. We want to be sure that someone else (e.g., our future selves) can look at our files and code and understand what we did and why. Similarly, we want to make sure that we know what each file is. (There is nothing worse than trying to figure out whether XXX_Final_V2_final.R or XXX_Final_V2_final.reallyfinal.R is actually the one to use!)

To accomplish this, we're going to follow a basic structure and a few rules. In essence, we are going to organize our work into projects, each containing specific types of files. You can create a new project (ideally in a new directory) under the Project / Create Project menu. From there on out, you need only open the *.Rproj file to open up a unique working directory and load up all files that are part of the project. (Later we can use version control on this directory, too, which is incredibly handy.)

Each project will consist of:
  • Your original, clean data (in *.csv format). Call this XXX_OriginalData.csv. Once this is clean and without issue, it does not change!
  • A data cleaning script (XXX_Cleaning.R) in which you clean and organize your data, creating whatever new data files you might need for your actual analyses. Here you will make sure that your data are of the appropriate type and range, that they all make sense, that they are organized in the right format (long vs. wide), and that they are merged or split into the files you need for analyses. If, during your analyses, you realize that you need your data organized differently, you will come back and modify this file. (Note: for simple analyses this script might be very short and it might not create new files, but you still want it.)
  • Whatever newly created data files (usually in *.csv format) you produced in the data cleaning script for your analyses. These can change during the analyses.
  • An analysis script in which you perform the analyses on your data (XXX_Analyses.R). This is the meat of the thing. Focus on just what you need to know. Get rid of the extraneous, exploratory code. It just mucks things up. (If you really want to keep those things, create a separate script and label it appropriately.)
  • Any functions or sub-scripts you need or create for your analyses.
  • An (optional) report document (XXX_Report.Rmd) in which you can merge your prose and R code and output, including graphs. In this class we will use R Markdown files, which we can weave into self-contained HTML files. Later on, you may want to venture down the path of LaTeX with Sweave() or knitr.
    • In this class, you will be required to send me the resulting HTML file and/or your R Markdown file. This way I can see your code as well as read what you have to say.
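
As a rough illustration of the cleaning step described in the list above, here is a minimal sketch of what XXX_Cleaning.R might look like. The column names (site, height_cm), the output file name XXX_CleanData.csv, and the tiny stand-in dataset are all hypothetical and exist only so the sketch runs on its own; a real script would read the original data file already sitting in the project folder.

```r
# XXX_Cleaning.R (sketch; file and column names below are hypothetical)
# Goal: read the original data, check types and ranges, and write an
#       analysis-ready data file.

# For this sketch only: write a tiny stand-in for XXX_OriginalData.csv to a
# temporary directory so the script runs anywhere. In a real project the
# original file already exists in the project folder and this step is omitted.
project_dir <- tempdir()
original_file <- file.path(project_dir, "XXX_OriginalData.csv")
write.csv(
  data.frame(site = c("a", "a", "b", "b"),
             height_cm = c(12.1, 15.3, NA, 9.8)),
  original_file,
  row.names = FALSE
)

# 1. Read the original data; this file is never overwritten
raw_data <- read.csv(original_file, stringsAsFactors = FALSE)

# 2. Check that each column has the expected type and range
stopifnot(is.character(raw_data$site))
stopifnot(is.numeric(raw_data$height_cm))
stopifnot(all(raw_data$height_cm > 0, na.rm = TRUE))

# 3. Organize the data; here, drop rows with a missing height
clean_data <- raw_data[!is.na(raw_data$height_cm), ]

# 4. Write the new data file that the analysis script will read
clean_file <- file.path(project_dir, "XXX_CleanData.csv")
write.csv(clean_data, clean_file, row.names = FALSE)
```

Because the original file is only ever read, re-running this script from the top always rebuilds the derived file from the same starting point.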

Rules:
  • You will keep all of your code in scripts!!!
  • You will comment your code, starting each file with a list of the goals of your analyses, the data required, and the approach you will use. Keep this up to date. Write the comments as if the person reading them has next to zero understanding of what you're going on about. Trust me, after a couple of years you won't either!
  • You will keep your code organized and easy to read (e.g., nicely indented, with common formatting, etc.)
  • You will follow this style guide for naming variables, functions, etc. (In essence, use short, descriptive names; separate words in a name with an underscore; put whitespace around operators such as " + " to make them easier to read.)
  • You will not get hung up keeping different versions of each file (e.g., XXX_Analyses_2013_Jan_12.R, etc.). For the first several weeks, we will have little reason to keep track of changes. You will just want the most recent, most correct version. Later, we will introduce version control, which keeps track of different versions for you in a much more organized, efficient, and useful manner.
  • You will have one folder per project. For instance, I have one folder and project for this class. In it are all of the data files and scripts for each lab. For my own data I will have one folder/project for a manuscript I'm working on or, sometimes, one folder/project for a dataset that forms the basis for multiple manuscripts. In this class, I would recommend you have one folder/project for the class and lab stuff, and another for your independent project.
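
To make the commenting and naming rules above concrete, here is a small sketch of how the top of an analysis script might look. The research question, data values, and every name in it (plant_data, standard_error, and so on) are made up for illustration; the stand-in data frame is there only so the sketch runs without any files.

```r
# XXX_Analyses.R (sketch; names and data are hypothetical)
# Goals:    Test whether mean plant height differs between two sites.
# Data:     A cleaned data file with columns site and height_cm.
# Approach: Summarize height by site, then compare sites with a t-test.

# Stand-in data so the sketch runs on its own; a real script would instead
# read the file written by the cleaning script.
plant_data <- data.frame(
  site = c("a", "a", "a", "b", "b", "b"),
  height_cm = c(12.1, 15.3, 13.8, 9.8, 10.4, 11.1)
)

# Short, descriptive function name; words separated by underscores
standard_error <- function(x) {
  sd(x) / sqrt(length(x))
}

# Mean and standard error of height at each site
mean_height <- tapply(plant_data$height_cm, plant_data$site, mean)
se_height <- tapply(plant_data$height_cm, plant_data$site, standard_error)

# Welch two-sample t-test of height between sites
height_test <- t.test(height_cm ~ site, data = plant_data)
```

The header states the goals, data, and approach before any code runs, and each step carries a comment that a reader with no background could follow.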